Robust k-means: a Theoretical Revisit
In recent years, many variations of the quadratic k-means clustering procedure have been proposed, all aiming to robustify the performance of the algorithm in the presence of outliers. In general terms, two main approaches have been developed: one based on penalized regularization methods, and one based on trimming functions. In this work, we present a theoretical analysis of the robustness and consistency properties of a variant of the classical quadratic k-means algorithm, the robust k-means, which borrows ideas from outlier detection in regression. We show that two outliers in a dataset are enough to break down this clustering procedure. However, if we focus on "well-structured" datasets, then robust k-means can recover the underlying cluster structure in spite of the outliers. Finally, we show that, with slight modifications, the most general non-asymptotic results for consistency of quadratic k-means remain valid for this robust variant.
Reviews: Robust k-means: a Theoretical Revisit
In this paper the authors study theoretical properties of the robust k-means (RKM) formulation proposed in [5,23]. They first study robustness, showing that if the function f_\lambda is convex, then one outlier is sufficient to break down the algorithm, and that if f_\lambda need not be convex, then two outliers can break down the algorithm. On the other hand, under some structural assumptions on the non-outliers, a non-trivial breakdown point can be established for RKM. The authors then study consistency, generalising consistency results that are known for convex f_\lambda to non-convex f_\lambda. My main concern with the paper is that the results appear very specific, and I am not entirely sure whether they will appeal to a more general audience in machine learning.
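For readers unfamiliar with the setting, the following is a minimal sketch of what an outlier-penalized k-means of this kind might look like. It is not taken from the paper: the objective, sum_i min_l 0.5 * ||x_i - c_l - o_i||^2 + lam * ||o_i||_2, is an assumed instance with a convex penalty standing in for f_\lambda, and the function name `robust_kmeans` and its parameters are hypothetical.

```python
import numpy as np

def robust_kmeans(X, k, lam, n_iter=50, seed=0):
    """Alternating minimization of the ASSUMED objective
        sum_i min_l 0.5 * ||x_i - c_l - o_i||^2 + lam * ||o_i||_2,
    where o_i is a per-point outlier vector. The convex group penalty is
    an illustrative choice, not necessarily the one analysed in the paper."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centers at k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    O = np.zeros_like(X)
    for _ in range(n_iter):
        # Assignment step: nearest center for each corrected point x_i - o_i.
        Z = X - O
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Center step: mean of the corrected points in each cluster.
        for l in range(k):
            mask = labels == l
            if mask.any():
                centers[l] = Z[mask].mean(axis=0)
        # Outlier step: group soft-thresholding of the residuals, i.e. the
        # proximal operator of lam * ||.||_2 applied to x_i - c_{l(i)}.
        R = X - centers[labels]
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
        O = scale * R
    return centers, labels, O
```

Under this assumed formulation, a point whose residual norm is at most `lam` gets a zero outlier vector and is treated as in ordinary k-means, while a point with a larger residual has everything beyond `lam` absorbed into `o_i`. A non-convex penalty would replace the soft-thresholding step with a different thresholding rule.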